Branch prediction and simultaneous multithreading
Abstract
In this paper, we examined the behavior of three of the best-performing branch prediction strategies when several instruction threads execute simultaneously. We studied the impact of adding one Return Address Stack per hardware context, and showed that a 12-deep stack per thread is sufficient to greatly improve branch prediction accuracy at minimal implementation cost. We explored the behavior of the branch predictors both when independent applications run simultaneously and when the workload consists of a parallel program. Our simulations showed that in a multiprogramming environment, if the sizes of the tables (PHT/BTB) are proportional to the number of active threads, there are very few interactions, whether destructive or constructive. With parallel workloads, a beneficial sharing effect might have been expected. In fact, it depends heavily on the branch predictor, and in the best case the gains remain very limited. Finally, we showed that, for the three predictors, whether in multiprogramming or in parallel processing, if the tables are kept small, there is a slight increase in mispredictions, mostly due to an increase in conflicts in the BTB.
The work of Sébastien Hily is partly funded by the Brittany region (région Bretagne).
French title: Prédiction de branchement et multiflot simultané
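The per-context Return Address Stack mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the class name, method names, and the drop-oldest overflow policy are assumptions made for illustration. Only the idea of one stack per hardware context and the depth of 12 come from the abstract.

```python
from collections import deque

RAS_DEPTH = 12  # per-thread depth reported as sufficient in the abstract


class PerThreadReturnStacks:
    """One bounded return address stack per hardware context (hypothetical model)."""

    def __init__(self, num_threads, depth=RAS_DEPTH):
        # Assumed overflow policy: deque(maxlen=depth) drops the oldest
        # entry when a push would exceed the stack depth.
        self.stacks = [deque(maxlen=depth) for _ in range(num_threads)]

    def on_call(self, thread_id, return_address):
        # A call pushes its return address on the calling thread's own stack.
        self.stacks[thread_id].append(return_address)

    def predict_return(self, thread_id):
        # A return pops the most recent address pushed by the same thread;
        # None means the stack is empty and no prediction is available.
        stack = self.stacks[thread_id]
        return stack.pop() if stack else None
```

With separate stacks, calls interleaved from two threads do not disturb each other's predictions; a single shared stack would pop whichever address was pushed last, regardless of which thread pushed it.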
Similar resources
Evaluating Branch Predictors on an SMT Processor
Simultaneous multithreading (SMT) provides significant increases in microprocessor throughput by issuing instructions from multiple threads per clock cycle. SMT can be realized in a wide-issue superscalar with a modest increase in resources, because much of the hardware is shared among the multiple thread contexts. Branch prediction accuracy, a key component of microprocessor performance, can s...
A latency-conscious SMT branch prediction architecture
Executing multiple threads has proved to be an effective solution to partially hide latencies that appear in a processor. When a thread is stalled because a long-latency operation is being processed, like a memory access or a floating-point calculation, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions cau...
Improving Conditional Branch Prediction on Speculative Multithreading Architectures
Dynamic conditional branch prediction is an indispensable technique for increasing performance in modern processors. However, currently proposed schemes suffer from loss of accuracy when applied to speculative multithreading CMP architectures. In this paper, we quantitatively investigate this problem and present a hardware scheme to improve the prediction accuracy. Evaluation results show that ...
Tolerating Branch Predictor Latency on SMT
Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with...
Simultaneous Speculation Scheduling - A Technique for Speculative Dual Path Execution
Commodity microprocessors uniformly apply branch prediction and single path speculative execution to all kinds of program branches and suffer from the high misprediction penalty which is caused by branches with low prediction accuracy and, in particular, by branches that are unpredictable. The Simultaneous Speculation Scheduling (S3) technique removes such penalties by a combination of compiler...
Publication date: 1996